# Building a MatGraphDB Example with MPNearHull Data

In this notebook, we demonstrate how to build a materials graph database using the
[MatGraphDB](https://github.com/your/matgraphdb) framework with the MPNearHull dataset.

The steps include:
1. Importing required libraries and setting up configuration paths.
2. Downloading and extracting the dataset (and raw materials data if needed).
3. Creating a MatGraphDB instance.
4. Initializing node generators.
5. Initializing edge generators.
6. Verifying the database setup.

Follow along and run each cell to see how the database is constructed.

## Setup

In [3]:
import os
import shutil
import zipfile
import gdown

# Data directory for this example, relative to the notebook. Change this to your own data directory.
DATA_DIR = os.path.join("..", "..", "data", "examples", "01")
# Define the path to store the raw materials data.
MATERIALS_PATH = os.path.join(DATA_DIR, "material")
# Define the path where the MatGraphDB database will be stored.
MATGRAPHDB_PATH = os.path.join(DATA_DIR, "MatGraphDB")

# Define the dataset URLs.
DATASET_URL = "https://drive.google.com/uc?id=1zSmEQbV8pNvjWdhFuCwOeoOzvfoS5XKP"

# Define the URL for the raw materials data.
RAW_DATASET_URL = "https://drive.google.com/uc?id=14guJqEK242XgRGEZA-zIrWyg4b-gX5zk"  # (Not used below but available)

# # Define the path to store the raw materials data.
# RAW_DATASET_ZIP = os.path.join(config.data_dir, "raw", "MPNearHull_v0.0.1_raw.zip")

# # Define the path to store the dataset.
# DATASET_ZIP = os.path.join(config.data_dir, "datasets", "MPNearHull_v0.0.1.zip")

print("Library imports and paths are set.")


Library imports and paths are set.
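
Because the paths above are relative, they resolve against whatever directory the notebook kernel was started in. The following is a minimal sketch of one way to make the data directory explicit and overridable. The helper `resolve_data_dir` and the environment variable name `MATGRAPHDB_EXAMPLE_DATA` are hypothetical and are not part of MatGraphDB.

```python
import os
from pathlib import Path


def resolve_data_dir(default, env_var="MATGRAPHDB_EXAMPLE_DATA"):
    """Return an absolute data directory, preferring an environment override.

    The environment variable name is an assumption made for this sketch.
    """
    raw = os.environ.get(env_var) or os.fspath(default)
    return Path(raw).expanduser().resolve()


data_dir = resolve_data_dir(Path("..") / ".." / "data" / "examples" / "01")
materials_path = data_dir / "material"
matgraphdb_path = data_dir / "MatGraphDB"

# Printing the absolute paths makes it obvious where files will be written.
print(materials_path)
print(matgraphdb_path)
```

Printing the resolved paths before downloading anything is a cheap way to confirm that a 632 MB archive will land where you expect.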


### Define Function for Downloading and Extracting Data

In [None]:
def download_raw_materials(mp_materials_path):
    """
    Download and extract the materials data if it is not already present.

    The archive is extracted into mp_materials_path, the files under
    MPNearHull/nodes/material are moved up into mp_materials_path, and the
    archive and the rest of the extracted tree are deleted.
    """
    if not os.path.exists(mp_materials_path):
        os.makedirs(mp_materials_path, exist_ok=True)
        print("Downloading raw materials data...")

        raw_dataset_zip = os.path.join(mp_materials_path, "MPNearHull_v0.0.1_raw.zip")
        # Note: this downloads DATASET_URL, not RAW_DATASET_URL, as in the original code.
        gdown.download(DATASET_URL, output=raw_dataset_zip, quiet=False)

        print("Extracting raw materials data...")
        with zipfile.ZipFile(raw_dataset_zip, "r") as zip_ref:
            zip_ref.extractall(mp_materials_path)
        os.remove(raw_dataset_zip)

        # Move the material files out of the extracted tree, then delete the tree.
        mp_nearhull_path = os.path.join(mp_materials_path, "MPNearHull")
        tmp_materials_path = os.path.join(mp_nearhull_path, "nodes", "material")
        for file in os.listdir(tmp_materials_path):
            shutil.move(os.path.join(tmp_materials_path, file), os.path.join(mp_materials_path, file))

        shutil.rmtree(mp_nearhull_path)
        print("Raw materials data ready!")

        
# Optionally, download the raw materials data if you plan to initialize from raw files.
if not os.path.exists(MATERIALS_PATH):
    download_raw_materials(MATERIALS_PATH)
else:
    print("Raw materials data already exists.")

Downloading raw materials data...


Downloading...
From (original): https://drive.google.com/uc?id=1zSmEQbV8pNvjWdhFuCwOeoOzvfoS5XKP
From (redirected): https://drive.google.com/uc?id=1zSmEQbV8pNvjWdhFuCwOeoOzvfoS5XKP&confirm=t&uuid=5bcba796-ff8e-4bb3-bc09-39d3f1136dc1
To: c:\Users\lllang\Desktop\Current_Projects\MatGraphDB\examples\notebooks\materials\MPNearHull_v0.0.1_raw.zip
100%|██████████| 632M/632M [00:11<00:00, 53.6MB/s] 


Extracting raw materials data...
Raw materials data ready!
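
The extract-then-flatten step in the function above can be tried without the large download. The sketch below builds a tiny archive with the same `MPNearHull/nodes/material` layout and flattens it. The helper name `extract_and_flatten` and the demo file names are hypothetical; only the directory layout comes from the code above.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_and_flatten(zip_path, dest_dir, inner_parts=("MPNearHull", "nodes", "material")):
    """Extract zip_path into a scratch directory and move the contents of the
    inner directory into dest_dir. Returns the sorted names that were moved."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The scratch directory sits next to dest so the moves stay on one filesystem.
    with tempfile.TemporaryDirectory(dir=dest.parent) as scratch:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(scratch)
        inner = Path(scratch).joinpath(*inner_parts)
        if not inner.is_dir():
            raise FileNotFoundError(f"Expected {'/'.join(inner_parts)} inside {zip_path}")
        moved = []
        for item in sorted(inner.iterdir()):
            shutil.move(str(item), str(dest / item.name))
            moved.append(item.name)
    return moved


# Demo on a small synthetic archive (the file names are made up).
with tempfile.TemporaryDirectory() as work:
    demo_zip = Path(work) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("MPNearHull/nodes/material/a.parquet", "a")
        archive.writestr("MPNearHull/nodes/material/b.parquet", "b")
    moved = extract_and_flatten(demo_zip, Path(work) / "material")
    leftovers = sorted(p.name for p in (Path(work) / "material").iterdir())
    print(moved)
```

Extracting into a scratch directory means a failed or interrupted extraction never leaves a half-populated `MPNearHull` tree inside the target folder.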


## Initialization

### Initialize a Materials Store

In [4]:
from matgraphdb import MaterialStore

materials_store = MaterialStore(storage_path=MATERIALS_PATH)
print(materials_store)

NODE STORE SUMMARY
Node type: material
• Number of nodes: 80643
• Number of features: 136
Storage path: ..\..\data\examples\01\material


############################################################
METADATA
############################################################
• class: MaterialStore
• class_module: matgraphdb.materials.nodes.materials
• node_type: material
• name_column: id

############################################################
NODE DETAILS
############################################################



### Initialize a MatGraphDB Instance

In [5]:

from matgraphdb import MatGraphDB

# Start from a clean slate: remove any database left over from a previous run.
if os.path.exists(MATGRAPHDB_PATH):
    shutil.rmtree(MATGRAPHDB_PATH)
mdb = MatGraphDB(storage_path=MATGRAPHDB_PATH, materials_store=materials_store)

print(mdb)

GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: ..\..\data\examples\01\MatGraphDB
└── Repository structure:
    ├── nodes/                 (..\..\data\examples\01\MatGraphDB\nodes)
    ├── edges/                 (..\..\data\examples\01\MatGraphDB\edges)
    ├── edge_generators/       (..\..\data\examples\01\MatGraphDB\edge_generators)
    ├── node_generators/       (..\..\data\examples\01\MatGraphDB\node_generators)
    └── graph/                 (..\..\data\examples\01\MatGraphDB\graph)

############################################################
NODE DETAILS
############################################################
Total node types: 1
------------------------------------------------------------
• Node type: material
  - Number of nodes: 80643
  - Number of features: 136
  - db_path: ..\..\data\examples\01\MatGraphDB\nodes\material
------------------------------------------------------------

############################################################
EDGE DETAILS
#########
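
Clearing a previous database directory before rebuilding is a step that is easy to get backwards, since `shutil.rmtree` raises `FileNotFoundError` on a path that does not exist. A minimal sketch of a guarded reset follows; the helper name `reset_dir` is hypothetical and not part of MatGraphDB.

```python
import shutil
import tempfile
from pathlib import Path


def reset_dir(path):
    """Delete the directory at path if it exists.

    Returns True if a directory was removed and False if there was nothing to remove.
    """
    target = Path(path)
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


# Demo in a throwaway location.
with tempfile.TemporaryDirectory() as work:
    db_dir = Path(work) / "MatGraphDB"
    db_dir.mkdir()
    first = reset_dir(db_dir)   # the directory exists, so it is removed
    second = reset_dir(db_dir)  # nothing left to remove, and no exception
    print(first, second)
```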

## Adding Nodes

In this section, we add nodes to the MatGraphDB instance using several of the built-in node generators.

In [6]:
from matgraphdb.materials.nodes import (
    element, chemenv, crystal_system, magnetic_state, 
    oxidation_state, space_group, wyckoff, material_site, material_lattice
)

# Define the generator functions and any arguments they need.
# For example, the material site and lattice generators take the material store as an argument.
node_generators = [
    {"generator_func": element},
    {"generator_func": chemenv},
    {"generator_func": crystal_system},
    {"generator_func": magnetic_state},
    {"generator_func": oxidation_state},
    {"generator_func": space_group},
    {"generator_func": wyckoff},
    {
        "generator_func": material_site,
        "generator_args": {"material_store": mdb.node_stores["material"]},
    },
    {
        "generator_func": material_lattice,
        "generator_args": {"material_store": mdb.node_stores["material"]},
    },
]


Now we can add the node generators to the MatGraphDB instance. Each generator runs as soon as it is added and writes its nodes to the database.

In [7]:
# Add each node generator to the database.
for generator in node_generators:
    generator_func = generator.get("generator_func")
    generator_args = generator.get("generator_args", None)
    print(f"Adding node generator: {generator_func.__name__}")
    mdb.add_node_generator(generator_func=generator_func, generator_args=generator_args)

print("Node generators have been initialized.")

print(mdb)


Adding node generator: element
Adding node generator: chemenv
Adding node generator: crystal_system
Adding node generator: magnetic_state
Adding node generator: oxidation_state
Adding node generator: space_group
Adding node generator: wyckoff
Adding node generator: material_site
Adding node generator: material_lattice
Node generators have been initialized.
GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: ..\..\data\examples\01\MatGraphDB
└── Repository structure:
    ├── nodes/                 (..\..\data\examples\01\MatGraphDB\nodes)
    ├── edges/                 (..\..\data\examples\01\MatGraphDB\edges)
    ├── edge_generators/       (..\..\data\examples\01\MatGraphDB\edge_generators)
    ├── node_generators/       (..\..\data\examples\01\MatGraphDB\node_generators)
    └── graph/                 (..\..\data\examples\01\MatGraphDB\graph)

############################################################
NODE DETAILS
############################################################
Total 
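
The configuration pattern used above, a list of dictionaries with a `generator_func` and optional `generator_args`, can be illustrated without MatGraphDB. The sketch below is a toy stand-in, not the implementation of `add_node_generator`: the function `run_generators` and the two toy generators are hypothetical and only show how a missing `generator_args` entry maps to a call with no arguments.

```python
def run_generators(configs):
    """Call each configured generator and collect the results by function name."""
    results = {}
    for config in configs:
        func = config["generator_func"]
        # A missing or None "generator_args" entry means "call with no arguments".
        kwargs = config.get("generator_args") or {}
        results[func.__name__] = func(**kwargs)
    return results


# Toy generators, hypothetical stand-ins for the built-in ones.
def crystal_system():
    return ["cubic", "hexagonal"]


def material_site(material_store):
    return [f"{material_id}-site0" for material_id in material_store]


demo_configs = [
    {"generator_func": crystal_system},
    {"generator_func": material_site, "generator_args": {"material_store": ["mp-1", "mp-2"]}},
]
demo_results = run_generators(demo_configs)
print(demo_results)
```

Keeping the function and its arguments together in one dictionary lets a single loop handle generators that need a store and generators that need nothing.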

## Adding Edges

In this section, we add edges to the MatGraphDB instance using several of the built-in edge generators.

In [8]:
from matgraphdb.materials.edges import (
    material_element_has,
    material_lattice_has,
    material_spg_has,
    element_element_neighborsByGroupPeriod,
    element_element_bonds,
    element_oxiState_canOccur,
    material_chemenv_containsSite,
    material_crystalSystem_has,
    element_chemenv_canOccur,
    spg_crystalSystem_isApart,
)



# List of edge generator configurations.
edge_generators = [
    {
        "generator_func": element_element_neighborsByGroupPeriod,
        "generator_args": {"element_store": mdb.node_stores["element"]},
    },
    {
        "generator_func": element_oxiState_canOccur,
        "generator_args": {
            "element_store": mdb.node_stores["element"],
            "oxiState_store": mdb.node_stores["oxidation_state"],
        },
    },
    {
        "generator_func": material_chemenv_containsSite,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "chemenv_store": mdb.node_stores["chemenv"],
        },
    },
    {
        "generator_func": material_crystalSystem_has,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "crystal_system_store": mdb.node_stores["crystal_system"],
        },
    },
    {
        "generator_func": material_element_has,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "element_store": mdb.node_stores["element"],
        },
    },
    {
        "generator_func": material_lattice_has,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "lattice_store": mdb.node_stores["material_lattice"],
        },
    },
    {
        "generator_func": material_spg_has,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "spg_store": mdb.node_stores["space_group"],
        },
    },
    {
        "generator_func": element_chemenv_canOccur,
        "generator_args": {
            "element_store": mdb.node_stores["element"],
            "chemenv_store": mdb.node_stores["chemenv"],
            "material_store": mdb.node_stores["material"],
        },
    },
    {
        "generator_func": spg_crystalSystem_isApart,
        "generator_args": {
            "spg_store": mdb.node_stores["space_group"],
            "crystal_system_store": mdb.node_stores["crystal_system"],
        },
    },
    {
        "generator_func": element_element_bonds,
        "generator_args": {
            "element_store": mdb.node_stores["element"],
            "material_store": mdb.node_stores["material"],
        },
    },
]


# Add each edge generator to the database and run them immediately.
for generator in edge_generators:
    generator_func = generator.get("generator_func")
    generator_args = generator.get("generator_args", None)
    print(f"Adding edge generator: {generator_func.__name__}")
    mdb.add_edge_generator(generator_func=generator_func, generator_args=generator_args, run_immediately=True)

print("Edge generators have been initialized.")
print(mdb)

Adding edge generator: element_element_neighborsByGroupPeriod
Adding edge generator: element_oxiState_canOccur
Adding edge generator: material_chemenv_containsSite
Adding edge generator: material_crystalSystem_has
Adding edge generator: material_element_has
Adding edge generator: material_lattice_has
Adding edge generator: material_spg_has
Adding edge generator: element_chemenv_canOccur
Adding edge generator: spg_crystalSystem_isApart
Adding edge generator: element_element_bonds
Edge generators have been initialized.
GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: ..\..\data\examples\01\MatGraphDB
└── Repository structure:
    ├── nodes/                 (..\..\data\examples\01\MatGraphDB\nodes)
    ├── edges/                 (..\..\data\examples\01\MatGraphDB\edges)
    ├── edge_generators/       (..\..\data\examples\01\MatGraphDB\edge_generators)
    ├── node_generators/       (..\..\data\examples\01\MatGraphDB\node_generators)
    └── graph/                 (..\..\data\examples
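
The ten edge generator names used above all appear to follow a `source_target_relation` pattern. That reading is inferred from the names themselves and is not confirmed by the library documentation. Under that assumption, a short sketch can summarize which node types the edges connect; the helper `parse_edge_name` is hypothetical.

```python
from collections import Counter

# Names copied from the import cell above.
EDGE_GENERATOR_NAMES = [
    "element_element_neighborsByGroupPeriod",
    "element_oxiState_canOccur",
    "material_chemenv_containsSite",
    "material_crystalSystem_has",
    "material_element_has",
    "material_lattice_has",
    "material_spg_has",
    "element_chemenv_canOccur",
    "spg_crystalSystem_isApart",
    "element_element_bonds",
]


def parse_edge_name(name):
    """Split a name into (source, target, relation), assuming the
    source_target_relation naming pattern."""
    source, target, relation = name.split("_", 2)
    return source, target, relation


parsed = [parse_edge_name(name) for name in EDGE_GENERATOR_NAMES]
edges_per_source = Counter(source for source, _, _ in parsed)

for source, target, relation in parsed:
    print(f"{source} --{relation}--> {target}")
```

A summary like this is a quick way to spot a node type that has no edges yet before running the more expensive generators.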

## Verifying the Database
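
Besides reading the printed summary, the set of node types can be checked programmatically. The sketch below assumes that `mdb.node_stores` behaves like a dictionary keyed by node type and that each node type is named after its generator function. The first assumption follows from how the stores are indexed earlier in this notebook; the second holds for the keys used there but is otherwise unconfirmed. The helper `check_node_types` is hypothetical and is demonstrated on a toy dictionary.

```python
# The material store plus the nine node generators added earlier.
EXPECTED_NODE_TYPES = {
    "material", "element", "chemenv", "crystal_system", "magnetic_state",
    "oxidation_state", "space_group", "wyckoff", "material_site", "material_lattice",
}


def check_node_types(node_stores, expected=EXPECTED_NODE_TYPES):
    """Compare the keys of a dict-like node store mapping against the expected set."""
    present = set(node_stores)
    return {
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
    }


# Toy mapping standing in for mdb.node_stores, with one type left out on purpose.
toy_stores = {name: object() for name in EXPECTED_NODE_TYPES if name != "wyckoff"}
report = check_node_types(toy_stores)
print(report)
```

With a real database, the call would be `check_node_types(mdb.node_stores)`, and an empty `missing` list would match the ten node types reported in the summary.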


In [9]:
print(mdb)

GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: ..\..\data\examples\01\MatGraphDB
└── Repository structure:
    ├── nodes/                 (..\..\data\examples\01\MatGraphDB\nodes)
    ├── edges/                 (..\..\data\examples\01\MatGraphDB\edges)
    ├── edge_generators/       (..\..\data\examples\01\MatGraphDB\edge_generators)
    ├── node_generators/       (..\..\data\examples\01\MatGraphDB\node_generators)
    └── graph/                 (..\..\data\examples\01\MatGraphDB\graph)

############################################################
NODE DETAILS
############################################################
Total node types: 10
------------------------------------------------------------
• Node type: material
  - Number of nodes: 80643
  - Number of features: 136
  - db_path: ..\..\data\examples\01\MatGraphDB\nodes\material
------------------------------------------------------------
• Node type: element
  - Number of nodes: 118
  - Number of features: 99
  - db_pat